DMTCP: Bringing Checkpoint-Restart to Python

نویسندگان

  • Kapil Arya
  • Gene Cooperman
چکیده

DMTCP (Distributed MultiThreaded CheckPointing) is a mature checkpoint-restart package. It operates in user-space without kernel privilege, and adapts to application-specific requirements through plugins. While DMTCP has been able to checkpoint Python and IPython "from the outside" for many years, a Python module has recently been created to support DMTCP. IPython support is included through a new DMTCP plugin. A checkpoint can be requested interactively within a Python session, or under the control of a specific Python program. Further, the Python program can execute specific Python code prior to checkpoint, upon resuming (within the original process), and upon restarting (from a checkpoint image). Applications of DMTCP are demonstrated for: (i) Python-based graphics using VNC; (ii) a Fast/Slow technique to use multiple hosts or cores to check one Cython computation in parallel; and (iii) a reversible debugger, FReD, with a novel reverse-expression watchpoint feature for locating the cause of a bug.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Generic Checkpoint-Restart Mechanism for Virtual Machines

It is common today to deploy complex software inside a virtual machine (VM). Snapshots provide rapid deployment, migration between hosts, dependability (fault tolerance), and security (insulating a guest VM from the host). Yet, for each virtual machine, the code for snapshots is laboriously developed on a per-VM basis. This work demonstrates a generic checkpoint-restart mechanism for virtual ma...

متن کامل

DMTCP: Scalable User-Level Transparent Checkpointing for Cluster Computations

As the size of clusters increases, failures are becoming increasingly frequent. Applications must become fault tolerant if they are to run for extended periods of time. We present DMTCP (Distributed MultiThreaded CheckPointing), the first user-level distributed checkpointing package not dependent on a specific message passing library. This contrasts with existing approaches either specific to l...

متن کامل

Temporal Debugging using URDB

A new style of temporal debugging is proposed. The new URDB debugger can employ such techniques as temporal search for finding an underlying fault that is causing a bug. This improves on the standard iterative debugging style, which iteratively re-executes a program under debugger control in the search for the underlying fault. URDB acts as a meta-debugger, with current support for four widely ...

متن کامل

Adapting the DMTCP Plugin Model for Checkpointing of Hardware Emulation

Checkpoint-restart is now a mature technology. It allows a user to save and later restore the state of a running process. The new plugin model for the upcoming version 3.0 of DMTCP (Distributed MultiThreaded Checkpointing) is described here. This plugin model allows a target application to disconnect from the hardware emulator at checkpoint time and then reconnect to a possibly different hardwa...

متن کامل

Temporal Debugging: Automating Time Travel Debugging with URDB

This work addresses two classical problems in debugging. First, while some excellent reversible debuggers have been built for C, C++, Java, Standard ML, other languages including MATLAB, Python and Perl lack such reversible debuggers. To solve this, this work contributes a new temporal debugging approach and a new software package, URDB (Universal Reversible DeBugger), which can extend the nati...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013